This tutorial covers:
* Downloading flow data from USGS gauges
* "Wrangling" the data into a more flexible format
* Calculating basic hydrograph metrics
* Flow Duration Curves
* Annual Flood Frequency
* Empirical return intervals
* Estimating return intervals
* Calculating some of the IHA metrics
You will need to install the dataRetrieval and tidyverse packages:

* You only need to do this once per machine
* Copy the following code into your console and run it (press enter)
* Do NOT put this code into your script
install.packages("dataRetrieval")
install.packages("tidyverse")
Now that we have the packages installed, we can start a new script (white square with a green plus sign icon in the top left) and begin our analysis.
library(dataRetrieval)
library(tidyverse)
We will use the dataRetrieval package to download the data.

For a more complete tutorial on the dataRetrieval package, see the package website.
Here, we will download data from the USGS gauge on the Colorado River near the UT-CO stateline
Save variables to make download easier
siteNo <- "USGS-09163500" #CO River near CO-UT stateline
# you will need to look up a site code for a different gauge for your assignment
pCode <- "00060" #discharge
# all daily data from 2003-2023
start.date <- "2003-10-01"
end.date <- "2023-12-31"
# For your assignment, make sure there is at least 10 years of data, and change the dates as appropriate
We use the read_waterdata_daily() function to download the data, and save it as an object called co_r:

co_r <- read_waterdata_daily(
  monitoring_location_id = siteNo,
  parameter_code = pCode,
  time = c(start.date, end.date))
## Requesting:
## https://api.waterdata.usgs.gov/ogcapi/v0/collections/daily/items?f=json&lang=en-US&limit=10000&monitoring_location_id=USGS-09163500&parameter_code=00060&time=2003-10-01%2F2023-12-31
# just the first few rows:
co_r
# statistic_id == 00003 indicates daily mean values

You can View() the whole thing with this code:

View(co_r)
The columns we care about are:

* time: the date, in YYYY-MM-DD format
* value: the mean discharge (Q) for that day
* Units for value are given in the unit_of_measure column, but it's always CFS, so we will just have to remember that

# remove "geometry" attributes
attr(co_r, "class") <- "data.frame"
# overwrite co_r to just be the two columns
# and rename value column to be "q"
co_r <- co_r |>
select(time, value)|>
rename(q = value)
co_r
Recall that a FDC is calculated as:
\[ \text{\% Exceedance} = \frac{m}{N+1} \times 100 \]

where \(m\) is the rank of the flow (rank 1 = largest flow), and \(N\) is the number of observations (number of days) in the data
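As a quick check of the formula before applying it to the real data, here is a minimal sketch in base R with a made-up five-day flow record (the flow values are hypothetical):

```r
# hypothetical daily flows (CFS), just to illustrate the formula
q <- c(500, 120, 80, 350, 60)

# rank from largest (m = 1) to smallest (m = N)
m <- rank(-q)
N <- length(q)

# percent exceedance: m / (N + 1) * 100
pe <- m / (N + 1) * 100
round(pe, 1) # 16.7 50.0 66.7 33.3 83.3
```

The largest flow (500 CFS) gets the smallest percent exceedance, which is why the curve slopes downward from left to right.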
daily_pe <- co_r |>
# sort by the q column
arrange(-q) |>
# add a "rank" column
mutate(m_rank = 1:n()) |>
# calculate % exceedance
mutate(PE = (m_rank / (n() +1)) *100)
# print out the new data object
daily_pe
ggplot(daily_pe,
aes(x = PE,
y = q)) +
geom_line()
* FDCs are usually plotted with logarithmic y- and x-axes
ggplot(daily_pe,
aes(x = PE,
y = q)) +
geom_line() +
scale_y_log10() +
scale_x_log10()
* we can also add a title and axis labels for clarity
ggplot(daily_pe,
aes(x = PE,
y = q)) +
geom_line() +
scale_y_log10() +
scale_x_log10() +
labs(title = "Daily Flow Duration Curve",
subtitle = "CO River near CO-UT stateline",
y = "Q (CFS)",
x = "Percent Exceedance")
We can filter the daily_pe object to see what flow equates to a specific percent exceedance:

# 10% exceedance?
daily_pe |>
  filter(PE == 10)

That returns no rows: no day has a percent exceedance of exactly 10, so we instead filter for a narrow range around 10:

daily_pe |>
  filter(PE >= 9.9,
         PE <= 10.1)
That range still returns several rows; we can narrow the filter command further:

daily_pe |>
  filter(PE > 9.99,
         PE < 10.01)
So the flow in the Colorado River met or exceeded 11400 CFS only 10% of the time.
In other words, a flow of 11400 CFS was relatively rare.
What are common flow values?
daily_pe |>
filter(PE > 74.99,
PE < 75.01)
Next, we calculate empirical return intervals from the co_r data set. Recall that the return interval is calculated as:

\[ R_{i} = \frac{N+1}{m} \]

* Where \(R_i\) is the recurrence interval (in years)
* \(m\) is rank (1 = largest annual flood)
* \(N\) is the number of years in the data set
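Here is a minimal sketch of that calculation with a made-up five-year record of annual maximum flows (the values are hypothetical):

```r
# hypothetical annual maximum flows (CFS) for a 5-year record
high_flow <- c(9000, 15000, 7000, 12000, 10000)

m <- rank(-high_flow) # rank 1 = largest flood
N <- length(high_flow)

Ri <- (N + 1) / m     # recurrence interval, in years
pr <- 1 / Ri          # annual exceedance probability
Ri # 1.5 6.0 1.2 3.0 2.0
```

The largest flood in a 5-year record gets Ri = 6 years; pr is just the reciprocal, so a 6-year flood has about a 17% chance of being equaled or exceeded in any given year.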
Our data right now shows the mean flow value for every day across
our timeframe
For the annual maximum series (AMS), we want the single largest value for every year in the dataset
In other words, we need to group the data by year and only keep the maximum value
Currently, co_r has a time column which
has the date information in it
There are many methods for sorting and grouping data which is in
a date-format (YYYY-MM-DD)
The method we are going to use is to make separate columns for
year month and day (they will be named y, m,
and d)
We can do this using handy functions that are a part of the
tidyverse package we already loaded
We will use the mutate() function to add the new y, m, and d columns:

* mutate() adds a column but does not change the number of rows

co_r |>
  mutate(y = year(time))

* This added a column called y for the year
* The year() function works by looking at the time column and extracting the year value from it
* This only works because the time column was already in a date format

co_r # note the <date> notation underneath time
# also note the <dbl> notation under q. dbl is short for "double", which means it is a number format which can contain decimals
co_r |>
pull(time) |>
class() # the output here says "Date"
## [1] "Date"
If you use the year() function and it doesn't work, it most likely means the data is in the wrong format.
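For example, if the dates had come in as plain character strings, you could convert them with as.Date() before using year(). This is just a sketch with a single made-up date string:

```r
library(lubridate) # loaded automatically with the tidyverse

d_chr <- "2003-10-01"    # a character string, not yet a Date
d_date <- as.Date(d_chr) # YYYY-MM-DD is the default format

class(d_date) # "Date"
year(d_date)  # 2003
```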
There are also month() and day() functions which will extract the relevant date information:

# overwrite co_r to have the date columns
co_r <- co_r |>
mutate(y = year(time),
m = month(time),
d = day(time))
# print out our new co_r to see what it looks like
co_r
We keep the time column so that we still have the full date information in case we need it later.

Now we group the data by year (the y column):

co_r |>
  group_by(y)
Note that the data looks the same, but there is now a
Groups: y[21] line at the top of the data output
this is saying that the data now contains groups based on values
in the y column, and there are 21 unique
groups
Now we summarize() the data to keep the single
largest value per group/year
The summarize() function reduces the data to have a
single row per group
# ams is short for annual maximum series
ams <- co_r |>
group_by(y) |>
summarize(high_flow = max(q))
# print out the new object
ams
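To see what group_by() plus summarize() is doing, here is a minimal sketch with a made-up two-year record (the values are hypothetical):

```r
library(dplyr)

toy <- tibble(
  y = c(2003, 2003, 2004, 2004),
  q = c(100, 400, 250, 300)
)

# keep only the largest q per year
toy |>
  group_by(y) |>
  summarize(high_flow = max(q))
# one row per year: 2003 -> 400, 2004 -> 300
```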
The ams object now has 21 rows, one per group/year. Now we rank the annual maxima and calculate return intervals:

ams_ri <- ams |>
# sort by the high_flow column
arrange(-high_flow) |>
# add a "rank" column
mutate(m_rank = 1:n()) |>
# calculate the return interval, and the probability
mutate(Ri = (n() + 1) / m_rank,
pr = 1 / Ri)
# print out the new data object
ams_ri
ams_ri |>
filter(Ri <= 15,
Ri >= 7)
So the 11-year flood has a Q of 38900 CFS, and the 7.3-year flood has a Q of 37200 CFS.
The 10-year flood must be less than 38900, and more than 37200, but what is the exact value?
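One way to estimate it is linear interpolation between the two bracketing floods, using base R's approx() function. This is a sketch using the Ri and Q values from the output above (Ri = 22/2 = 11 and 22/3 ≈ 7.33 years):

```r
# bracketing points from ams_ri: (Ri, Q)
ri_known <- c(22 / 3, 11)   # the 7.33- and 11-year floods
q_known  <- c(37200, 38900) # CFS

# linearly interpolate Q at Ri = 10
q10 <- approx(x = ri_known, y = q_known, xout = 10)$y
round(q10) # 38436
```

Keep in mind this is a rough empirical estimate; fitting a probability distribution to the AMS (the "Estimating return intervals" step) gives a more principled answer.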